Abstract

The interpretation of data in terms of multi-parameter models of new physics, using the Bayesian 
approach, requires the construction of multi-parameter priors. We propose a construction that 
uses elements of Bayesian reference analysis. Our idea is to initiate the chain of inference with the 
reference prior for a likelihood function that depends on a single parameter of interest that is a 
function of the parameters of the physics model. The reference posterior density of the parameter 
of interest induces on the parameter space of the physics model a class of posterior densities. We 
propose to continue the chain of inference with a particular density from this class, namely, the one 
for which indistinguishable models are equiprobable and use it as the prior for subsequent analysis. 
We illustrate our method by applying it to the constrained minimal supersymmetric Standard 
Model and two non-universal variants of it.



I. INTRODUCTION 



With the start of the Large Hadron Collider (LHC) [1], we have entered an era in which
speculation about new physics has given way to detailed experimental study. This has 
had the welcome consequence of focusing attention on a difficult practical question: given 
the plethora of models of potential new physics, many depending on multiple unknown 
parameters, what is the best practical way to navigate the landscape of possibilities? This 
is a multi-faceted problem, of which undoubtedly the most challenging is devising reliable 
background estimates for all the final states that are being scrutinized. Another challenge 
is the construction of very fast accurate simulations [2] of new physics models at hundreds 
of thousands, even millions, of parameter points. This is necessary because, in general, 
the effective cross section, $\epsilon(\theta)\,\sigma(\theta)$ (that is, the signal efficiency, $\epsilon(\theta)$, times the cross
section, $\sigma(\theta)$), is a function of the parameters $\theta$ of the model under investigation.

In this Paper, we shall assume that both of these difficult tasks have been accomplished. 
Instead we address another important facet of the problem, namely, that of extracting in- 
formation about a given new physics model once LHC data become sufficiently abundant 
to test it. We propose a new method that is applicable to any multi-parameter model that 
yields a prediction about the expected signal count. We illustrate the method using three su- 
persymmetric (SUSY) models [3]: the constrained minimal supersymmetric Standard Model 
(CMSSM) [4] and two non-universal variants of it.

The availability of increasingly powerful computers has made it possible to study multi- 
parameter models in a holistic manner. Indeed, it has become routine to use techniques such 
as Markov Chain Monte Carlo (MCMC) [5] to explore the multi-dimensional parameter
spaces of models such as SUSY [6]. This is another welcome development. Recent work 
on SUSY models [7] has shown that a holistic approach can yield qualitatively different 
conclusions from those arrived at using the traditional approach based on benchmarks [8]. 
SUSY models have been studied using both frequentist [5] and Bayesian [10] methods.
The frequentist studies typically construct confidence regions and obtain the best-fit point. 
Sometimes, information about individual parameters or pairs of parameters is obtained by 
projecting the likelihood function onto the parameters of interest. This procedure is actually 
a frequentist/Bayesian hybrid, which amounts to using a flat prior on the parameters. A 
conceptually more consistent, albeit approximate, frequentist approach is to construct a 
profile likelihood [12, 13] for the parameter of interest. For example, if the parameter of
interest is $m_0$ and $l(m_0, \omega) \propto p(x|m_0, \omega)$ is the likelihood function for observations $x$, where $\omega$
denotes the remaining parameters, the profile likelihood for $m_0$ is $l_p(m_0) \propto p(x|m_0, \hat{\omega}(m_0))$,
where $\hat{\omega}(m_0)$ is the best-fit value of the parameters $\omega$ for a given value of $m_0$. The profile
likelihood $l_p(m_0)$ is then used as if it were a true likelihood.
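
As a hedged illustration of profiling (not the procedure of the cited references), the sketch below uses a toy model that is purely an assumption made for this example: a signal-region count N with mean s + mu and a control count Y with mean b*mu. The parameter of interest is s, the nuisance parameter is mu, and the maximization over mu is done on a grid. All function names and numerical inputs are hypothetical.

```python
import math

def log_poisson(n, mean):
    """Log of the Poisson probability of n counts for the given mean."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)

def log_likelihood(s, mu, N, Y, b):
    """Toy two-parameter likelihood: signal-region count N and control count Y."""
    return log_poisson(N, s + mu) + log_poisson(Y, b * mu)

def profile_log_likelihood(s, N, Y, b, mu_grid):
    """Maximize the log-likelihood over the nuisance parameter mu on a grid."""
    return max(log_likelihood(s, mu, N, Y, b) for mu in mu_grid)

# Hypothetical inputs, chosen only for illustration.
N, Y, b = 12, 50, 10.0
mu_grid = [0.02 * i for i in range(1, 1001)]   # mu in (0, 20]
s_grid = [0.1 * i for i in range(0, 151)]      # s in [0, 15]

# Profile log-likelihood for the parameter of interest and its maximum.
profile = [profile_log_likelihood(s, N, Y, b, mu_grid) for s in s_grid]
s_best = s_grid[profile.index(max(profile))]
```

For this toy model the maximum of the profile coincides with the global best fit, s = N - Y/b, which gives a simple check of the grid scan.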

We propose to use the Bayesian approach [15] because of its strong theoretical foundations,
its generality and the fact that it is conceptually straightforward: given a prior $\pi(\theta)$
defined on the parameter space $\Theta$ of the model, where in general $\theta$ is multi-dimensional,
and a likelihood $p(x|\theta)$, one computes the posterior density $p(\theta|x) \propto p(x|\theta)\,\pi(\theta)$ from which
a myriad of details can be extracted such as point estimates or credible regions. It is also
possible to make predictions about which data would be most useful to take next, and one 
can rank models according to their concordance with observations. Moreover, all manner of 
uncertainties, irrespective of their provenance and how we choose to label them — statistical, 
systematic, theoretical, best guess, etc. — can be accounted for in a conceptually coherent 
and unified manner. 

Every fully Bayesian analysis, however, must contend with the problem of constructing a 
prior $\pi(\theta)$ on the parameter space of the model under investigation. This task is especially
difficult in circumstances in which intuition provides little guidance, as is invariably the case
for multi-parameter models. Current studies, which place flat or logarithmic priors on the
parameters of new physics models, are sensitive to the choice of prior [10]; therefore, the
choice of prior is a critical issue that must be squarely faced. This is the main purpose of 
this Paper. 

The current sensitivity of results to the prior is sometimes construed as an intrinsic 
difficulty with the Bayesian approach. In fact, the correct conclusion to be drawn is that 
it is not yet possible to place robust constraints on all the parameters of a typical multi- 
parameter model of new physics, a conclusion that is independent of the method used to 
extract information about the model be it frequentist or Bayesian. The difficulty is not that 
results are sensitive to the prior — this fact tells us something obvious and important: we need 
more data and better analyses. Rather the difficulty is that flat priors on multi-dimensional 
parameter spaces can lead to pathological results, which may not be apparent without a 
careful study. Flat priors have been used successfully; witness the recent discovery of single
top quark production by D0 [16] and CDF [17]. But these results were obtained with a flat
prior applied to a single carefully chosen parameter, namely, the cross section [18].

Given that our multi-dimensional intuition may be unreliable, we are faced with a choice: 
either abandon the Bayesian approach — and, in our view, abandon an extremely power- 
ful set of ideas — or, as we propose, put intuition aside and use a formal procedure with 
mathematically verifiable properties to place priors on the parameter spaces. We propose a 
solution inspired by a set of Bayesian methods called reference analysis [19–21], whose key
construct is the reference prior. 

We advocate the use of reference priors because they lead to inferences with useful proper- 
ties, including invariance under one-to-one transformations of the parameters and excellent 
frequentist coverage. The latter property means that the (Bayesian) credible regions are 
also approximate (frequentist) confidence regions. Moreover, the reference prior can be 
perturbed in a controlled way to check the robustness of conclusions. 

Having initiated the inference chain with a reference prior, we can use Bayesian methods to

• quantify the statistical significance of a signal, 

• rank models according to their concordance with observations, 

• estimate model parameters, and 

• design an optimal analysis for a given model and a given integrated luminosity. 

In this Paper, in addition to the main task of constructing multi-parameter priors, we address 
the first two points — the statistical significance of a signal and model ranking — and we defer 
consideration of the last two to a future publication. 

Bayesian reference analysis [19–21] provides a principled way to approach the problem
of multi-parameter priors. However, while the solution it proposes is computationally fea- 
sible for one-parameter problems, it rapidly becomes computationally prohibitive for multi- 
parameter problems using current algorithms. Since the 1-parameter problem is a well- 
understood, solved, problem, our proposed solution begins with the solution of a 1-parameter 
problem and proceeds to the multi-parameter problem by imposing two requirements on the 
multi-parameter prior: consistency and equiprobability, both of which are described in detail 
below. 

Our solution proceeds in four steps: 

1. first, we compute the marginal likelihood by integrating the likelihood function with
respect to an evidence-based prior over all parameters except the parameter of interest; 

2. next, we compute the reference prior associated with the marginal likelihood; 

3. then, we compute the reference posterior density for the parameter of interest, 

4. and, finally, we map the reference posterior density to a posterior density on the 
parameter space of each multi-parameter model under study. 

Clearly, these steps can be applied to any experiment that has a single parameter of interest. 
In this Paper, we apply the steps to a single count experiment because it yields the simplest 
possible analysis and the key calculations can be done exactly. In the following sections, we 
describe the single count model, its reference prior, and our method for mapping the signal 
posterior density to the parameter space of a given multi-parameter model. 

The Paper is organized as follows. In Sec. II we give a detailed description of the single
count model and its associated reference prior. Our construction of multi-parameter priors
is described in Sec. III. In Sec. IV we illustrate the method using three SUSY models, a
2-parameter CMSSM and two 5-parameter non-universal generalizations. We end with a
summary and concluding remarks.



II. THE SINGLE COUNT MODEL 



In the context of the LHC, the single count model describes the results of a "cut and
count" analysis in which $N$ proton-proton collision events are found to pass a given set of
selection criteria, that is, cuts. The expected number of events, $n$, is given by

$$n = \mu + s, \qquad (1)$$

where $\mu$ is the expected number of Standard Model background events and $s \geq 0$, assumed
to be purely additive, is the expected number of signal events due to (unknown) new physics.
The observed count is denoted by $N$ and the expected (that is, mean) count is denoted by
$n$. We shall use upper case letters for observed values and lower case letters for expected
values.

The result of any experiment can be encoded in its likelihood function, the probability
density function (pdf) of the observations (sometimes called the probability mass function
if the data are discrete) evaluated at the actual observations. From the likelihood function
and the prior density for the expected signal and background we can compute the posterior
probability $\mathrm{Pr}(s|N) = p(s|N)\,ds$ of the signal, that is, the probability that the expected
signal lies in the interval $\delta = (s, s + ds)$, given the observed count $N$.

We choose to parametrize the likelihood in terms of the expected signal $s$ rather than the
cross section $\sigma$, as is done in Ref. [21], so that the results of the counting experiment remain
independent of the new physics model. The cuts may have been motivated by a specific
model of new physics; however, the signal posterior density can be interpreted using any
physics model that makes predictions for the expected signal in the final states considered.
Moreover, as we shall see, we can devise a purely Bayesian measure of the degree to which
the observation of $N$ events favors the hypothesis $s > 0$ rather than the background-only
hypothesis $s = 0$, independently of any presumed model of new physics. This measure can
be readily generalized to a multi-count analysis.

For a counting experiment that yields $N$ events, we make the usual assumption that the
likelihood function is given by a Poisson distribution,

$$p(N|\mu, s) = \mathrm{Poisson}(N|\mu + s), \qquad (2)$$

with mean $\mu + s$. The associated 2-parameter prior, $\pi(\mu, s)$, can be factorized in two ways,

$$\pi(\mu, s) = \pi(s|\mu)\,\pi(\mu), \quad \text{Method 1}, \qquad (3)$$

$$\pi(\mu, s) = \pi(\mu|s)\,\pi(s), \quad \text{Method 2}, \qquad (4)$$

both of which were considered in Ref. [21]. Here, we consider Method 2 only. We do so
because we can reduce the likelihood function $p(N|\mu, s)$ to a function of the single parameter
$s$ through marginalization,

$$p(N|s) = \int_0^\infty p(N|\mu, s)\,\pi(\mu|s)\,d\mu, \qquad (5)$$

which permits the application of the 1-parameter reference prior algorithm [21] to compute
the reference prior for the expected signal, while avoiding the technical issue of nested
compact sets [21].

Following Ref. [21], we model the evidence-based prior $\pi(\mu|s)$ for the expected background
by a gamma density,

$$\pi(\mu|s) = \pi(\mu) = \frac{b\,(b\mu)^{Y-1/2}\,e^{-b\mu}}{\Gamma(Y + 1/2)}, \qquad (6)$$

where $b$ and $Y$ are known constants. We further assume that the prior is independent of the
expected signal, $s$. (See Appendix A for its derivation.) Then, we integrate over $\mu$ to arrive
at the 1-parameter marginal likelihood,

$$\begin{aligned}
p(N|s) &= \int_0^\infty p(N|\mu, s)\,\pi(\mu)\,d\mu \\
&= \int_0^\infty \frac{(\mu + s)^N e^{-(\mu + s)}}{N!}\,\frac{b\,(b\mu)^{Y-1/2}\,e^{-b\mu}}{\Gamma(Y + 1/2)}\,d\mu \\
&= \left[\frac{b}{b+1}\right]^{Y+1/2} \sum_{k=0}^{N} v_{Nk}\,\mathrm{Poisson}(k|s),
\end{aligned}$$

where

$$v_{ik} = \frac{\Gamma(Y + 1/2 + i - k)}{\Gamma(Y + 1/2)\,(i - k)!}\left[\frac{1}{b+1}\right]^{i-k}, \qquad (7)$$

for the expected signal, $s$, whose reference prior, $\pi(s)$, is calculated in the next section.
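
The closed form of the marginal likelihood in Eq. (7) can be cross-checked numerically. The following is a minimal sketch, assuming the gamma prior of Eq. (6) with shape Y + 1/2 and rate b; the function names, the midpoint integration rule, and the numerical values of N, s, Y and b are illustrative assumptions rather than part of the analysis.

```python
import math

def poisson(k, mean):
    """Poisson probability of k counts; handles mean = 0."""
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def v_coeff(i, k, Y, b):
    """Coefficient v_ik of Eq. (7), evaluated in log space for stability."""
    a = Y + 0.5
    log_v = (math.lgamma(a + i - k) - math.lgamma(a)
             - math.lgamma(i - k + 1) - (i - k) * math.log(b + 1.0))
    return math.exp(log_v)

def marginal_likelihood(N, s, Y, b):
    """Closed form of Eq. (7): p(N|s) as a weighted sum of Poisson terms."""
    prefactor = (b / (b + 1.0)) ** (Y + 0.5)
    return prefactor * sum(v_coeff(N, k, Y, b) * poisson(k, s)
                           for k in range(N + 1))

def marginal_likelihood_numeric(N, s, Y, b, mu_max=60.0, steps=60000):
    """Midpoint-rule integral of Poisson(N|mu + s) times the gamma prior."""
    a = Y + 0.5
    h = mu_max / steps
    total = 0.0
    for j in range(steps):
        mu = (j + 0.5) * h
        log_prior = (math.log(b) + (a - 1.0) * math.log(b * mu)
                     - b * mu - math.lgamma(a))
        total += poisson(N, mu + s) * math.exp(log_prior)
    return total * h

# Hypothetical inputs for illustration only.
closed = marginal_likelihood(8, 3.0, 5.0, 1.0)
numeric = marginal_likelihood_numeric(8, 3.0, 5.0, 1.0)
normalization = sum(marginal_likelihood(n, 3.0, 5.0, 1.0) for n in range(151))
```

Agreement between `closed` and `numeric`, together with a `normalization` close to one, indicates that the marginal likelihood is a properly normalized probability mass function in N.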



A. Reference Priors 



When we know almost nothing about a potential signal it seems prudent to use a prior 
for the expected signal that is as noncommittal as possible. The approach in high energy 
physics has been to use a flat prior [15] for a parameter about which little is known, or
for which one wishes to act as if that is the case. But, for multi-parameter models, our 
intuition is ill-equipped to choose the parameterization in terms of which the prior is flat. 
We therefore propose a different approach. Our idea is to construct a prior for each new 
physics model starting with the reference prior for an experiment with a single parameter 
of interest (here the expected signal, $s$, for a single count experiment). By construction, a
reference prior [19–21], on average and given unlimited data, maximizes the influence of the
data relative to the prior.

The intuition that underlies the construction of such priors is that the influence of the
observations will be greatest if the "separation" between the posterior density and the prior
is as large as possible. Reference analysis [22] quantifies the separation between the two
densities $p(s|N)$ and $\pi(s)$ using the Kullback-Leibler (KL) divergence, which for the particular
problem we address is given by

$$D[\pi, p] = \int p(s|N) \ln\frac{p(s|N)}{\pi(s)}\,ds. \qquad (8)$$

This non-negative quantity, which is invariant under one-to-one transformations of $s$ and
zero if and only if the densities $p(s|N)$ and $\pi(s)$ are identical, may also be interpreted as a
measure of the information gained from the (single count) experiment.

Since we wish to maximize the influence of the observations, we might be tempted to
maximize Eq. (8) with respect to the prior, $\pi(s)$. This, however, would be unsatisfactory
because the prior would then depend on the specific observations, which would enter the
posterior density twice: once in the prior and once in the likelihood. It is more satisfactory
to use the average of $D[\pi, p]$ over all possible observations. Integration over the space of
observations, standard practice in the frequentist approach, may seem a decidedly
un-Bayesian thing to do. However, the likelihood principle [26], the idea that inferences should
be based on the observed data only, makes sense only if we actually have observations.
Obviously, before we perform the analysis, we do not know the value of the count $N$;
therefore, since the count is unknown we should average over all possible realizations of $N$.
Once we know the count, our inferences should be based on $N$ only. For completeness, we
give the key details of the reference prior algorithm in Appendix B.

The calculation of reference priors simplifies considerably for posterior densities that are
asymptotically normal, that is, that become Gaussian as more and more data are included.
In this case, the reference prior coincides with the Jeffreys prior [23],

$$\pi(s) = \sqrt{E\left[-\frac{d^2 \ln p(N|s)}{ds^2}\right]}, \qquad (9)$$

where for the single count model the expectation is with respect to the (marginal) likelihood
$p(N|s)$, given in Eq. (7). For a counting experiment, the asymptotic form of the posterior
density $p(s|N)$ is indeed Gaussian. Therefore, the reference prior for $p(N|s)$ can be computed
using Eq. (9). Adapting the results of Ref. [21], we find

[Eq. (10): the expression for $\pi(s)$, together with its auxiliary sums over $k$ defined for $m = 0, 1$, is not recoverable from the source text.]

Here $v_{ik}$ are the coefficients defined in Eq. (7). The complete reference prior, $\pi(\mu, s)$, is the
product of Eqs. (6) and (10), while the complete reference posterior density is

$$p(\mu, s|N) \propto p(N|\mu, s)\,\pi(\mu, s). \qquad (11)$$
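
The Jeffreys construction can be checked numerically in a simplified setting. The sketch below assumes a pure Poisson likelihood with a known background mean (that is, without the gamma-prior marginalization of Eq. (7)), for which the Fisher information is 1/(s + mu) and the Jeffreys prior is proportional to 1/sqrt(s + mu). The function names, the finite-difference step, and the truncation of the sum over counts are illustrative assumptions.

```python
import math

def log_poisson(n, mean):
    """Log of the Poisson probability of n counts for the given mean."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)

def fisher_information(s, mu, n_max=400, eps=1e-3):
    """E[-d^2 ln p(N|s)/ds^2] for p(N|s) = Poisson(N|s + mu).

    The second derivative is taken by central finite differences and the
    expectation is a sum over counts N = 0, ..., n_max.
    """
    info = 0.0
    for n in range(n_max + 1):
        lp0 = log_poisson(n, s + mu)
        d2 = (log_poisson(n, s + mu + eps) - 2.0 * lp0
              + log_poisson(n, s + mu - eps)) / eps ** 2
        info -= math.exp(lp0) * d2
    return info

def jeffreys_prior(s, mu):
    """Unnormalized Jeffreys prior: square root of the Fisher information."""
    return math.sqrt(fisher_information(s, mu))

# Hypothetical values: compare with the analytic result 1 / (s + mu).
info = fisher_information(2.0, 3.0)
ratio = jeffreys_prior(2.0, 3.0) / jeffreys_prior(7.0, 3.0)
```

The same finite-difference approach could in principle be applied to the marginal likelihood of Eq. (7), at the cost of a slower sum over counts.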
The reference posterior density for the expected signal is obtained by integrating over $\mu$,

$$p(s|N) = \int_0^\infty p(\mu, s|N)\,d\mu \propto p(N|s)\,\pi(s), \qquad (12)$$

where $p(N|s)$ and $\pi(s)$ are given by Eqs. (7) and (10), respectively. (See Appendix C for
more technical details.)

B. A Measure of Signal Significance

Assessing the statistical significance of a signal is a standard analysis task in high energy
physics [11], one which traditionally has been done with a p-value [12]. Here we propose an
alternative measure that uses the reference posterior density $p(\mu, s|N)$.

Suppose we are given some function $\delta(\mu, s)$ that measures the separation between the
(composite) background plus signal hypothesis, $H_1: \mu > 0, s > 0$, and the (composite)
background-only hypothesis, $H_0: \mu > 0, s = 0$. If the separation between the hypotheses
were large enough then presumably we would reject the background-only hypothesis in favor
of the alternative. But, since we know neither the expected background $\mu$ nor the expected
signal $s$, the natural Bayesian thing to do is to average $\delta(\mu, s)$ with respect to all possible
hypotheses about the values of $\mu$ and $s$,

$$d(N) = \int_0^\infty ds \int_0^\infty d\mu\; \delta(\mu, s)\,p(\mu, s|N) = \int_0^\infty ds \int_0^\infty d\mu\; \delta(\mu, s)\,\frac{p(N|\mu, s)\,\pi(\mu, s)}{p(N)}, \qquad (13)$$

where $p(N)$ is the normalization constant $p(N) = \int_0^\infty ds \int_0^\infty d\mu\; p(N|\mu, s)\,\pi(\mu, s)$. If $\delta(\mu, s)$
is interpreted as a loss function then $d(N)$ is a measure of the loss incurred, on average, if
one were to stubbornly adhere to the background-only hypothesis regardless of the outcome
of the experiment. A signal is declared to be statistically significant if $d(N) > d^*$, where $d^*$
is some agreed-upon threshold. Moreover, the decision to accept or reject $H_0$ and thereby
reject or accept the alternative $H_1$ may be taken independently of any model of new physics.
There are many possible choices for the function $\delta(\mu, s)$. We propose to use the
Kullback-Leibler divergence [19, 20],



$$\delta(\mu, s) = \sum_{k=0}^{\infty} p(k|\mu + s) \ln\frac{p(k|\mu + s)}{p(k|\mu)} = -s + (\mu + s)\ln(1 + s/\mu), \qquad (14)$$

between the densities $p(k|\mu + s)$ and $p(k|\mu)$ associated with hypotheses $H_1$ and $H_0$,
respectively. For fully specified models, Eq. (14) is simply the expected log-likelihood ratio. We
can gain some insight into $\delta(\mu, s)$ by considering a counting experiment for which $s \ll \mu$,
which characterizes early searches for new physics. In this limit

$$\delta(\mu, s) = -s + (s + \mu)\ln(1 + s/\mu) \approx -s + (s + \mu)\left[\frac{s}{\mu} - \frac{s^2}{2\mu^2} + \cdots\right] \approx \frac{1}{2}\frac{s^2}{\mu}, \qquad (15)$$

that is, $\sqrt{2\,\delta(\mu, s)} \approx s/\sqrt{\mu}$. This suggests taking the quantity,

$$q = \sqrt{2\,d(N)}, \qquad (16)$$

as a Bayesian analog of the well-known (and oft-abused) measure of "signal significance,"
$s/\sqrt{\mu}$. As such, it is an analog of an "n-sigma," that is, the standard re-scaling of a
p-value using the single tail area of a normal density [12]. This approximate correspondence
provides a simple calibration of $d(N)$.
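
The small-signal calibration can be verified with a few lines of code. The sketch below evaluates the closed form of Eq. (14) at a fixed point (mu, s), cross-checks it against a direct evaluation of the defining sum, and compares sqrt(2 delta) with s/sqrt(mu). It does not perform the posterior average that defines d(N); the function names and the values of mu and s are illustrative assumptions.

```python
import math

def kl_divergence(mu, s):
    """Closed form of Eq. (14): KL divergence of Poisson(mu + s) from Poisson(mu)."""
    return -s + (mu + s) * math.log1p(s / mu)

def kl_divergence_sum(mu, s, k_max=2000):
    """Direct evaluation of the defining sum in Eq. (14), as a cross-check."""
    total = 0.0
    for k in range(k_max + 1):
        log_p1 = k * math.log(mu + s) - (mu + s) - math.lgamma(k + 1)
        log_p0 = k * math.log(mu) - mu - math.lgamma(k + 1)
        total += math.exp(log_p1) * (log_p1 - log_p0)
    return total

# Hypothetical early-search scenario with s much smaller than mu.
mu, s = 100.0, 5.0
delta = kl_divergence(mu, s)
significance = math.sqrt(2.0 * delta)   # sqrt(2 delta) from Eq. (15)
naive = s / math.sqrt(mu)               # the familiar s / sqrt(mu)
```

For s much smaller than mu the two numbers agree to within a few percent, and the agreement degrades as s/mu grows, in line with the expansion in Eq. (15).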



1. Generalization to Multiple Counts 

For an experiment that yields $K$ independent counts, $N_k$, $k = 1, \cdots, K$, with expected
background and signal counts $\mu_k$ and $s_k$, respectively, the KL divergence is simply the sum

$$\delta(\mu_1, s_1, \cdots) = \sum_{k=1}^{K} \delta(\mu_k, s_k), \qquad (17)$$

over terms $\delta(\mu_k, s_k)$, each of which is given by Eq. (14), while the signal significance measure
generalizes to

$$\begin{aligned}
d(N_1, \cdots) &= E[\delta(\mu_1, s_1, \cdots)] \\
&= \int_0^\infty ds_1 \int_0^\infty d\mu_1 \cdots \int_0^\infty ds_K \int_0^\infty d\mu_K\; \delta(\mu_1, s_1, \cdots)\,p(\mu_1, s_1, \cdots|N_1, \cdots) \\
&= \sum_{k=1}^{K} \int_0^\infty ds_1 \int_0^\infty d\mu_1 \cdots \int_0^\infty ds_K \int_0^\infty d\mu_K\; \delta(\mu_k, s_k) \\
&\quad \times\, p(N_1|\mu_1, s_1)\,\pi(\mu_1, s_1)/p(N_1) \cdots p(N_K|\mu_K, s_K)\,\pi(\mu_K, s_K)/p(N_K) \\
&= \sum_{k=1}^{K} d(N_k),
\end{aligned} \qquad (18)$$

where we have used the fact that the posterior density $p(\mu_1, s_1, \cdots|N_1, \cdots)$ factorizes into a
product of $K$ terms, one for each count $N_k$, each of which integrates to one.



III. MULTI-PARAMETER PRIORS AND MODEL RANKING 
A. Multi-Parameter Priors 

We have a well-defined reference posterior density for the signal, $p(s|N)$, which satisfies

$$\int_0^\infty ds\; p(s|N) = 1. \qquad (19)$$

Our task now is to map it to a density $p(\theta)$ on the parameter space of a given physics
model.

By assumption, the model predicts the expected signal $s$ via a predictor function $s = f(\theta)$.
Consequently, the reference posterior density $p(s|N)$ induces, or is consistent with, posterior
densities on $\theta$ that satisfy [22]

$$p(s|N) = \int_\Theta \delta[s - f(\theta)]\,p(\theta)\,d\theta. \qquad (20)$$

Equation (20) is the consistency requirement we alluded to. Note that Eqs. (19) and (20) imply
that $\int_\Theta d\theta\; p(\theta) = 1$.

Equation (20) determines $p(\theta)$ only to within a class. Therefore, we need a plausible
way to choose a specific function from that class that would serve as a suitable posterior
density and hence a prior for subsequent analysis. To that end, we note that every point
$\theta \in A$, where $A$ is the image of $\delta = (s, s + ds) \subset \mathbb{R}$, is associated with the same expected
signal $s \in \delta$. In that sense, the points in $A$ are indistinguishable; that is, $A$ defines a set of
"look-alike" (LL) models. We therefore propose that $p(\theta)$ be chosen so that

every point within $A$ is equiprobable, (21)

that is, that the density $p(\theta)$ be constant over $A$. This choice yields the following expression
for $p(\theta)$,

$$p(\theta) = p(s(\theta)|N)/A(s(\theta)), \qquad (22)$$

where

$$A(s) = \int_\Theta \delta[s - f(\theta)]\,d\theta, \qquad (23)$$

is the area of the hyper-surface defined by $s - f(\theta) = 0$. This choice is arguably the
simplest for $p(\theta)$ given that the only information at hand is the reference posterior density
for the signal. If, however, one has cogent information about how $p(\theta)$ should vary on these
hyper-surfaces, then our simple choice can be replaced with something consistent with this
information and Eq. (20).
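
The mapping of Eqs. (22) and (23) can be sketched on a grid. The example below is only a toy under stated assumptions: the parameter space is the unit square, the predictor function is a hypothetical f(x, y) = 20xy, the signal posterior is a stand-in gamma density with shape N + 1/2 (not the reference posterior derived from Eq. (7)), and the surface term A(s) is estimated by binning the grid points in s.

```python
import math

def reference_posterior(s, N=5):
    """Toy stand-in for p(s|N): gamma density with shape N + 1/2 (an assumption)."""
    if s <= 0.0:
        return 0.0
    a = N + 0.5
    return math.exp((a - 1.0) * math.log(s) - s - math.lgamma(a))

def predictor(x, y):
    """Hypothetical predictor function s = f(theta) on the unit square."""
    return 20.0 * (x * y)

n_grid, n_bins, s_max = 200, 50, 20.0
cell = 1.0 / n_grid                  # side length of one grid cell
bin_width = s_max / n_bins
centers = [(i + 0.5) * cell for i in range(n_grid)]

def bin_index(s):
    """Index of the s-bin that contains the expected signal s."""
    return min(int(s / bin_width), n_bins - 1)

# Expected signal at every grid point.
signal = [[predictor(x, y) for y in centers] for x in centers]

# A(s) of Eq. (23): parameter-space volume per unit s, estimated bin by bin.
area = [0.0] * n_bins
for row in signal:
    for s in row:
        area[bin_index(s)] += cell * cell / bin_width

# Eq. (22): p(theta) = p(s(theta)|N) / A(s(theta)).
density = [[reference_posterior(s) / area[bin_index(s)] for s in row]
           for row in signal]

# Integral of p(theta) over the parameter space; should be close to one.
total = sum(sum(row) for row in density) * cell * cell
```

Grid points with the same expected signal receive the same density, which is the equiprobability requirement, and the induced density integrates to approximately one, as implied by Eqs. (19) and (20).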

There are two technical challenges in our proposed method. The first is that, in general,
we do not have explicit functional forms for the mapping $s = f(\theta)$. In practice, in order
to calculate the expected signal, we simulate a large number of signal events for a given
parameter point $\theta$, we apply cuts to these events and we determine what fraction of them
survive the cuts; that is, we calculate the signal efficiency $\epsilon(\theta)$. Then, for a given integrated
luminosity $\mathcal{L}$, we compute the expected signal using $s = \epsilon(\theta)\,\sigma(\theta)\,\mathcal{L} = f(\theta)$, where $\sigma(\theta)$ is
the cross section. The second challenge is the calculation of the surface term, Eq. (23). We
discuss both of these calculations in Sec. IV, in which we illustrate the practical application
of our method. But first we briefly review the standard Bayesian approach to model ranking.



B. Model Ranking 

If Nature is kind to us, we shall eventually start to see signals of new physics at the LHC. 
Then, the most important tasks will be to characterize the observations experimentally and 
determine which candidate model best describes them. 
Suppose we wish to rank $M = 1, \cdots, J$ candidate models of new physics according to their
concordance with the observations. In general, each model will have its own set of parameters
$\theta_M$, perhaps differing in meaning, dimensionality, or both. The standard Bayesian approach
to model ranking is, as usual, direct: calculate the probability of each model $M$ [27] given
the observations. The model with the highest probability wins.

Given the likelihood function $p(\mathrm{data}|\theta_M, M)$ and prior $\pi(\theta_M, M) = \pi(\theta_M|M)\,\pi(M)$, we
first compute the evidence [27],

$$p(\mathrm{data}|M) = \int d\theta_M\; p(\mathrm{data}|\theta_M, M)\,\pi(\theta_M|M), \qquad (24)$$

and then the probability of each model

$$P(M|\mathrm{data}) = p(\mathrm{data}|M)\,\pi(M) \Big/ \sum_{M'=1}^{J} p(\mathrm{data}|M')\,\pi(M'), \qquad (25)$$

where $\pi(M)$ is a discrete prior probability distribution over the space of models. The
polemical aspect of Eq. (25) is the need to specify the values of $\pi(M)$, on which there seems
little chance of agreement. If, however, the models are judged to be equally implausible (or
if the LHC experiments were to reach an accord to that effect), it would be appropriate to
set $\pi(M) = 1/J$, in which case Eq. (25) reduces to

$$P(M|\mathrm{data}) = p(\mathrm{data}|M) \Big/ \sum_{M'=1}^{J} p(\mathrm{data}|M'). \qquad (26)$$

Absent such an accord, it is still possible to rank models using their evidences: the larger
the evidence the more favored is the model.

But, there is an important caveat: it is necessary to use proper priors for $\pi(\theta_M|M)$, that
is, priors that integrate to one. An improper prior is defined only to within an arbitrary 
scale factor. Consequently, were such a prior to be used to compute the evidence, the latter 
would be defined only to within the same arbitrary scale factor. Therefore, in order for the 
evidences to be well-defined, the priors must be proper. By construction, this is the case for 
the multi-dimensional priors introduced above. 
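
Once the evidences are available, the model probabilities of Eq. (25) are a matter of normalization. The sketch below works with log-evidences, since evidences for realistic likelihoods are often too small to represent directly, and defaults to equal prior probabilities over the models. The function name and the numerical log-evidences are hypothetical.

```python
import math

def model_probabilities(log_evidences, priors=None):
    """Posterior model probabilities, Eq. (25), from log-evidences ln p(data|M)."""
    J = len(log_evidences)
    if priors is None:
        priors = [1.0 / J] * J           # equal prior probability for each model
    log_terms = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    shift = max(log_terms)               # subtract the maximum for numerical stability
    weights = [math.exp(t - shift) for t in log_terms]
    norm = sum(weights)
    return [w / norm for w in weights]

# Hypothetical log-evidences for three candidate models.
log_evidences = [-1050.2, -1047.9, -1055.0]
probs = model_probabilities(log_evidences)

# Model indices ordered from most to least probable.
ranking = sorted(range(len(probs)), key=lambda m: probs[m], reverse=True)
```

With equal priors the ranking depends only on the evidences, so it is unchanged if all log-evidences are shifted by a common constant.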

Models can also be ranked using Bayesian reference analysis. However, we defer the 
discussion to a future publication. 
IV. ILLUSTRATIVE EXAMPLES



Our proposed method for constructing multi-parameter priors is quite general. It can be 
applied, in principle, to any physics model of any dimensionality provided that the model 
makes a prediction for the parameter of interest, which in our case is the expected signal in 
a counting experiment. For simplicity, however, we illustrate the application of the method 
using a SUSY model with only two free parameters for which the results are easily visualized. 
We then consider two 5-parameter models. 

A. 2-D Model 

The first model we consider is the sub-model of the CMSSM [4] defined by the free
parameters $m_0$, $m_{1/2}$, and the fixed parameters $\tan\beta = 10$, $A_0 = 0$ and $\mu > 0$. We take
the CMS benchmark point LM1 [5], defined by the fixed parameters $m_0 = 60$, $m_{1/2} = 250$,
$\tan\beta = 10$, $A_0 = 0$ and $\mu > 0$, as our true state of Nature (TSN), which provides the
"observed" count $N$ [28]. For each point in a grid of points in the $m_0$-$m_{1/2}$ plane, including
the point LM1, the SUSY spectrum is calculated using SOFTSUSY 3.1 [29] and sparticle
decays using SUSYHIT [30]. We generate 1000 LHC events at 7 TeV using PYTHIA 6.4 [31]
and approximate the response of the CMS detector [32] to these events using a modified
version of the fast detector simulation program PGS [33]. We apply a CMS multijets plus
missing transverse energy ($\not\!\!E_T$) event selection [31] to the events simulated at each point
$\theta = (m_0, m_{1/2})$ and we take the background estimates from the CMS analysis in Ref. [34].

Three hypothetical results are considered: i) $N = 3$ events observed in $\mathcal{L} = 1$ pb$^{-1}$ of
data; ii) $N = 270$ events observed in 100 pb$^{-1}$; and iii) $N = 1335$ events observed in 500 pb$^{-1}$.
In each case, we compute the posterior density $p(s|N)$ for the expected signal count at each
point in the $m_0$-$m_{1/2}$ plane and map it to the posterior density $p(m_0, m_{1/2})$, which we take
as the prior $\pi(m_0, m_{1/2})$. The value of the surface term in this case is simply the length of
the curve $s - f(m_0, m_{1/2}) = 0$.

The plots in Fig. 1 show the induced posterior density $p(m_0, m_{1/2})$, and hence prior
$\pi(m_0, m_{1/2})$, for the three integrated luminosities. The plots show several nice features. For
low statistics, the prior is featureless in the region to which the experiment has no sensitivity,
while the low mass region is disfavored. At moderate luminosity the prior peaks at the right
value, favoring the correct model and, with the same probability, all its LL models. At large
luminosity the prior converges to the correct LL sub-space $A$, which, as noted, is a curve.

[Figure 1: three panels, each titled "CMSSM, $A_0 = 0$, $\tan\beta = 10$, $\mu > 0$".]

FIG. 1: Induced posterior densities on the $m_0$-$m_{1/2}$ plane for 1 pb$^{-1}$ (left), 100 pb$^{-1}$ (center),
and 500 pb$^{-1}$ (right). The TSN is indicated by the black dot.

The fact that the sub-space is not a single point shows that an infinite amount of data does 
not necessarily guarantee the irrelevance of the prior that initiated the chain of inference. 
This is why choosing the prior carefully is important. Since the LL sub-space A is extended, 
it remains sensitive to the initiating prior, which because of the manner in which we choose 
to map $p(s|N)$ to $p(m_0, m_{1/2})$ is constant across the LL sub-space. The upshot of this is
that we should expect the initiating prior to become irrelevant only if an analysis is able
to break the model degeneracy so that with an infinite amount of data the LL sub-space
collapses to a point or, more realistically, to a very small sub-space over which the variation 
of the initiating prior is negligible. 

The degeneracy between models with the same expected signal count — which we argue 
is a desirable property — is intrinsic to the approach we propose. However, having defined a 
prior over the parameter space of the model under study, we can move well beyond a simple 
counting experiment. SUSY models have the virtue of making numerous predictions that 
can be tested in a variety of ways. We argue that the interpretation of data at the LHC 
should be done in a manner that is consistent with all the tested predictions of the model 
under consideration. To do otherwise risks reaching scientifically untenable conclusions: for 
example, that a region of parameter space is still allowed when a more complete analysis 
might say quite the opposite. If we have access to results from different analyses, perhaps 
from different experiments, we argue that a consistent analysis should incorporate these results
whenever possible. The ability to do this in a systematic manner is one of our motivations
for addressing the problem of multi-parameter priors. 

In order to break the model degeneracy, we can incorporate the likelihood associated with 
a set of additional observables $x$ and compute the posterior density $p(m_0, m_{1/2}|x)$ using the 
prior $\pi(m_0, m_{1/2})$ computed from the single count analysis. An example is given in Fig. 2 
where the function, 

$p(m_0, m_{1/2}|x) \propto p(x|m_0, m_{1/2}) \, \pi(m_0, m_{1/2})$, (27) 

is shown as a function of $m_0$ and $m_{1/2}$. We consider the set of measured electroweak 
observables, $g-2$, BR($b \to s\gamma$), BR($B \to \tau\nu$), BR($B \to D\tau\nu$)/BR($B \to De\nu$), 
$R_{\ell 23}$, $D_s \to \tau\nu$, $D_s \to \mu\nu$ and $\Delta\rho$, for which the likelihood is 

$p(x|m_0, m_{1/2}) \propto \prod_i \mathrm{Gaussian}(x_i|a_i, \sigma_i)$, (28) 

where $a_i = a_i(m_0, m_{1/2})$ is the predicted value of observable $i$ for the model $(m_0, m_{1/2})$, 
which is computed for each observable above using SuperIso [35] and micrOMEGAs 2.4 [36], 
and $x_i \pm \sigma_i$ is the associated experimental measurement, in which the central value $x_i$ 
is taken as the prediction for our TSN, and the uncertainty $\sigma_i$ is taken from the actual 
measurements quoted by the Particle Data Group [37]. 
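
The reweighting expressed by Eqs. (27) and (28) can be sketched numerically as follows. This is only a minimal illustration on a finite set of model points, not the analysis code used here: the three model points, their predictions, the measured values and the uncertainties are all hypothetical, and the function names are ours.

```python
import numpy as np

def log_gaussian_likelihood(predicted, measured, sigma):
    """Log of Eq. (28), up to an additive constant that is the same for every model point."""
    predicted, measured, sigma = map(np.asarray, (predicted, measured, sigma))
    return -0.5 * np.sum(((measured - predicted) / sigma) ** 2, axis=-1)

def posterior_weights(log_like, prior):
    """Normalized posterior of Eq. (27) evaluated on a finite set of model points."""
    log_post = log_like + np.log(prior)
    log_post -= log_post.max()          # guard against numerical underflow
    weights = np.exp(log_post)
    return weights / weights.sum()

# Hypothetical toy inputs: three model points, two observables each.
predictions = np.array([[1.0, 2.0], [1.5, 2.5], [3.0, 0.0]])   # a_i at each model point
measured = np.array([1.0, 2.0])                                  # x_i
sigma = np.array([0.5, 0.5])                                     # sigma_i
prior = np.array([1 / 3, 1 / 3, 1 / 3])                          # pi at each model point

weights = posterior_weights(log_gaussian_likelihood(predictions, measured, sigma), prior)
```

Working with log-likelihoods and subtracting the maximum before exponentiating keeps the weights finite even when many observables are combined.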



[Fig. 2 panels: CMSSM, $A_0 = 0$, $\tan\beta = 10$, $\mu > 0$; axis label $m_0$ (GeV).] 

FIG. 2: Posterior density induced on the $m_0$-$m_{1/2}$ plane, after the inclusion of the electroweak 
observables, for 1 pb$^{-1}$ (left), 100 pb$^{-1}$ (center), and 500 pb$^{-1}$ (right). The TSN is indicated by 
the black dot. The central values of the electroweak observables are computed at the TSN point, 
but we use the experimental uncertainties from Ref. [37]. 

The plots in Fig. 2 show that the electroweak results are helpful in breaking the model 
degeneracy. We expect this conclusion to remain true for realistic analyses and models. 
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B. 5-D Models 



We now consider two 5-parameter models that illustrate the more realistic situation in 
which the use of a regular grid of parameter points in the space rapidly becomes unfeasible 
due to the well-known "curse of dimensionality". The standard way to circumvent this 
problem is to sample points using Markov Chain Monte Carlo. This is what we propose to 
do in order to approximate the posterior density $p(\theta)$ where, now, $\theta$ represents a parameter 
point in the 5-dimensional model space. 

1. Models 

We define two non-universal extensions of the CMSSM that we call NU$m_0$ and NU$m_{1/2}$, 
which respectively have non-universal $m_0$ and non-universal $m_{1/2}$. We choose our TSN from 
NU$m_0$, and therefore also refer to it as the "TSN model". We refer to the other model as 
the "wrong model". Note that this model cannot be used to parametrize the TSN point 
due to its universal $m_0$. The free parameters of the two models and the parameter values at 
TSN are as follows: 

• TSN model: NU$m_0$ (CMSSM with non-universal $m_0$): 

- $m_0(1,2)$ : 250 GeV at TSN 

- $m_0(3) = m_{H_{u,d}}$ : 1.5 TeV at TSN 

- $m_{1/2}$ where $m_{1/2} = m_{1/2}(1,2) = m_{1/2}(3)$ : 300 GeV at TSN 

- $A_0$ : 0 GeV at TSN 

- $\tan\beta$ : 10 at TSN 

• Wrong model: NU$m_{1/2}$ (CMSSM with non-universal $m_{1/2}$): 

- $m_0$ where $m_0 = m_0(1,2) = m_0(3) = m_{H_{u,d}}$ 

- $m_{1/2}(1,2)$ 

- $m_{1/2}(3)$ 

- $A_0$ 

- $\tan\beta$ 

For both cases, we take the sign of $\mu$ to be positive. 
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2. Priors 



Our method follows the common Bayesian strategy of "sacrificing" a small fraction of 
the data to generate what we have referred to as an initiating prior, that is, a prior that 
permits the inference chain to proceed. In this example, the multi-parameter priors for the 
TSN and wrong models are constructed assuming a 100 pb$^{-1}$ data-set. We again use the 
SOFTSUSY [29], SUSYHIT [30], PYTHIA [31] sequence to generate events, but use Delphes [2] to 
simulate the CMS detector [32], and we apply the same CMS jets plus $\not\!\!E_T$ (missing transverse energy) analysis [33]. For 
simplicity, we assume that the subsequent analysis is again that of a counting experiment 
identical to the one used to construct the priors, except that the integrated luminosity is 
larger. In practice, one would work hard to adapt, improve, and change the analyses as 
more and more data are accumulated. However, our purpose here is not to do a realistic 
analysis but simply to illustrate our method. 

The quantities pertaining to the TSN point, assuming 100 pb$^{-1}$, are: 

cross section                          $\sigma$ = 1.35 pb, 
signal efficiency                      $\epsilon$ = 0.412, 
"observed" count                       $N$ = 169 events, 
background estimate                    A = 113 ± 11.3 events, 
sideband yield                         $Y$ = 100 events, 
sideband/signal region scale factor    $b$ = 0.889. 

(29) 
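
As a cross-check of the numbers in Eq. (29), the short sketch below recomputes the expected signal and compares the background estimate with the sideband quantities. The relations used for the sideband, a background of $Y/b$ with uncertainty $\sqrt{Y}/b$, are our assumption, based on the description in Appendix A, of how these quantities are connected; the variable names are illustrative only.

```python
import math

sigma_pb = 1.35               # cross section (pb)
efficiency = 0.412            # signal efficiency
lumi_pb = 100.0               # integrated luminosity (pb^-1)
bkg, bkg_err = 113.0, 11.3    # background estimate (events)
Y, b = 100.0, 0.889           # sideband yield and sideband/signal scale factor

expected_signal = sigma_pb * efficiency * lumi_pb   # efficiency x cross section x luminosity
expected_total = expected_signal + bkg              # to be compared with N = 169

# Assumed relations between the sideband and the signal-region background:
bkg_from_sideband = Y / b
bkg_err_from_sideband = math.sqrt(Y) / b

print(f"expected signal       = {expected_signal:.1f} events")
print(f"expected total        = {expected_total:.1f} events")
print(f"background (sideband) = {bkg_from_sideband:.1f} +/- {bkg_err_from_sideband:.1f} events")
```

Under these assumptions the expected total rounds to the quoted "observed" count, and the sideband values reproduce the quoted background estimate to within rounding.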

The reference prior using the above values for $Y$ and $b$ is shown in Fig. 3. The reference 
posterior density $p(s|N)$ is computed using the numbers at the TSN point. However, since 
it is no longer realistic to use a uniform grid of points, we generate a sample of points 
$\theta_i$ from the reference posterior density $p(s|N)$ with $s = f(\theta)$, for each model, using the 
Metropolis-Hastings algorithm [38] and multiple MCMC chains. Asymptotically, this sampling 
procedure will produce a density that satisfies Eq. (20). Moreover, to the degree that 
the chains can thoroughly explore the surfaces $s - f(\theta) = 0$, the generated points will also 
satisfy Eq. (22); that is, the surface term will be automatically incorporated. The mapping 
from one to multiple dimensions is discussed further in Appendix D using a 2-dimensional 
toy model. 
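
To make the sampling step concrete, here is a minimal random-walk Metropolis-Hastings sketch for a toy problem. It is not the implementation used for the results in this section. Everything specific is an assumption made for illustration: a 2-dimensional parameter space, $s = f(\theta) = |\theta|$, a Gaussian stand-in for $p(s|N)$, and the analytically known surface measure $A(s) = 2\pi s$, which is used to build the target density $p(s)/A(s)$.

```python
import numpy as np

def log_target(theta, s_mean=3.0, s_sigma=0.5):
    """Log of p(s)/A(s) with s = |theta| and A(s) = 2*pi*s (toy assumptions)."""
    s = np.hypot(theta[0], theta[1])
    if s <= 0.0:
        return -np.inf
    log_p = -0.5 * ((s - s_mean) / s_sigma) ** 2    # Gaussian stand-in for p(s|N)
    return log_p - np.log(2.0 * np.pi * s)

def metropolis_hastings(log_density, start, n_steps, step_size, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    current = np.asarray(start, dtype=float)
    current_lp = log_density(current)
    chain = np.empty((n_steps, current.size))
    for i in range(n_steps):
        proposal = current + step_size * rng.standard_normal(current.shape)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if np.log(rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

rng = np.random.default_rng(1)
chain = metropolis_hastings(log_target, [3.0, 0.0], 30000, 1.0, rng)
s_samples = np.hypot(chain[3000:, 0], chain[3000:, 1])   # discard burn-in, map back to s
```

In this toy case the distribution of `s_samples` should approach the assumed 1-dimensional density, while the points spread uniformly around each circle of fixed $s$. In a realistic model $A(s)$ is not known analytically, which is the difficulty addressed in the text.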

Figure 4 shows the 1-dimensional marginal densities of the induced prior for the TSN 
model on which are superimposed the posterior densities. The 1-dimensional marginals 
for the wrong model are shown in Fig. 5. In both figures, the location of the TSN point is 
indicated by the vertical dashed line. Note that in each figure two of the plots are degenerate: 
the $m_{1/2}(1,2)$ and $m_{1/2}(3)$ plots in Fig. 4 for the TSN model and the $m_0(1,2)$ and $m_0(3)$ plots 
in Fig. 5 for the wrong model. For the TSN model, most of the peaks of the 1-dimensional 
densities are near the TSN point, while for the wrong model this is not the case. 

FIG. 3: The reference prior, $\pi(s)$, for the single count model computed using Eq. (10) (line) 
compared with the same computed numerically using Eq. (B3) (points). The horizontal axis is the 
expected signal $s$. 

We can get a better idea of the shape of the posterior densities from their 2-dimensional 
marginals, which are shown in Fig. 6. The black point in each plot is the TSN point. One 
feature that seems puzzling at first is that the TSN point does not always lie at the peak of 
the densities. Note, however, the following. If the hyper-surface $s - f(\theta) = 0$ on which 
the TSN point lies is larger than another hyper-surface associated with a smaller 
value of the reference posterior density $p(s|N)$, then the value of $p(\theta)$ 
on the TSN hyper-surface can actually be smaller than its value on the other hyper-surface, even 
though the total probability of the TSN hyper-surface is greater than the total probability 
of other hyper-surfaces. 

Figure 7 shows what happens to the prior after multiplication by the likelihood for the 
electroweak results. As expected, these results make a noticeable change to the prior, in 
sharp contrast to the result of the counting experiment. This is, perhaps, not surprising 
since the observed count constrains only the signal strength, whereas the electroweak results 
constrain multiple observables that help break the model degeneracy. 

FIG. 4: Induced marginal densities for the TSN model assuming 100 pb$^{-1}$. The shaded histograms 
are the priors. The posterior densities, obtained by weighting the sampled points by the likelihood 
for the counting experiment (dark line) and the combined likelihood for the electroweak experiments 
(light line), are superimposed on the priors. The vertical dashed line indicates the position of the 
TSN point. From these projections, one would conclude that the influence of the result of the 
counting experiment is negligible, while the influence of the electroweak results is quite evident. 



3. Signal Significance 



Table I shows how the signal significance, as defined in Eq. (13), increases as a function 
of integrated luminosity. We expect this number to scale like $\sim\sqrt{\mathcal{L}}$, which indeed it does. 



FIG. 5: Induced marginal densities for the wrong model. See Fig. 4 for details. 



TABLE I: Signal significance as a function of integrated luminosity for the TSN model. 

Integrated luminosity    "Observed" count (TSN)    Significance    d(N) 
(fb$^{-1}$)              N events 
0.5                      331                       12.2            4.9 
1.0                      387                       13.6            5.2 
2.0                      660                       19.2            6.2 
5.0                      1754                      33.2            8.2 


4. Model Ranking 

As we noted, the purpose of this example is to illustrate the prior construction method. 
However, it is interesting to see what happens if we try to rank the TSN and wrong models 
on the basis of the signal strength only. The results are shown in Table II. We find that 
even with the relatively weak constraint afforded by merely counting events, we are able to 
rank these models consistently, albeit weakly. 

FIG. 6: Induced 2-dimensional marginal posterior densities for the TSN model. The TSN is 
indicated by the black dot. See text for details. 

FIG. 7: Induced 2-dimensional marginal posterior densities for the TSN model including the effect 
of the electroweak results. The TSN is indicated by the black dot. See text for details. 

TABLE II: Ranking of the TSN and wrong models as a function of integrated luminosity. 

Integrated       Evidence for    Evidence for    Evidence TSN over 
luminosity       TSN model       wrong model     Evidence wrong model 
0.5 fb$^{-1}$    0.00253         0.00205         1.233 
1.0 fb$^{-1}$    0.00203         0.00164         1.235 
2.0 fb$^{-1}$    0.00102         0.00083         1.238 
5.0 fb$^{-1}$    0.00034         0.00028         1.245 
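
The last column of Table II is the ratio of the two evidences. The sketch below simply recomputes that ratio from the tabulated (rounded) evidences. Small differences in the second or third decimal place with respect to the quoted ratios are to be expected, presumably because the quoted ratios were formed before rounding. The dictionary names are ours.

```python
# Evidences from Table II, keyed by integrated luminosity in fb^-1.
evidence_tsn = {0.5: 0.00253, 1.0: 0.00203, 2.0: 0.00102, 5.0: 0.00034}
evidence_wrong = {0.5: 0.00205, 1.0: 0.00164, 2.0: 0.00083, 5.0: 0.00028}
quoted_ratio = {0.5: 1.233, 1.0: 1.235, 2.0: 1.238, 5.0: 1.245}

# Ratio of evidences (TSN model over wrong model) at each luminosity.
ratios = {lumi: evidence_tsn[lumi] / evidence_wrong[lumi] for lumi in evidence_tsn}

for lumi, ratio in ratios.items():
    print(f"{lumi:3.1f} fb^-1: recomputed {ratio:.3f}, quoted {quoted_ratio[lumi]:.3f}")
```

All ratios are above unity but close to it, which is the sense in which the ranking is consistent, albeit weak.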



V. SUMMARY AND CONCLUSIONS 

We have proposed a method for building multi-parameter priors that follows the general 
strategy of building a proper prior using a small portion of the data and analyzing the rest 
using that prior. Since the direct construction of multi-parameter priors, with mathemati- 
cally well-defined properties, is a difficult task we have proposed a method that begins with 
a simpler task, namely, the construction of a reference prior for an analysis having a single 
parameter of interest. Together with the likelihood function, the reference prior yields a 
proper posterior density that is consistent with a class of posterior densities on the param- 
eter space of the physics model under study. We proposed choosing a particular member 
from this class to serve as the multi-parameter prior for subsequent analyses. That prior 
has the property that its density is constant on every hyper-surface indexed by the param- 
eter of interest. Moreover, because it is built from a reference prior, the multi-parameter 
prior is expected to yield credible regions with excellent frequentist properties. Finally, the 
robustness of inferences can be assessed by weighting the multi-parameter prior $\pi(\theta)$ by, for 
example, $w(s) = [A(s)/p(s|N)]^r$ and studying the sensitivity of inferences to the exponent 
$0 < r < 1$. The exponent $r$ permits a smooth interpolation between the reference prior 
($r = 0$) and a flat prior ($r = 1$). 
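
The interpolation can be illustrated with a few lines of code. The sketch assumes, as in the construction above, that the multi-parameter prior takes the form $\pi(\theta) = p(s|N)/A(s)$ on each hyper-surface; the numerical values of $p(s|N)$ and $A(s)$ below are hypothetical.

```python
import numpy as np

def weighted_prior(p_s, area_s, r):
    """pi(theta) * w(s), with pi = p(s|N)/A(s) and w(s) = [A(s)/p(s|N)]**r."""
    p_s = np.asarray(p_s, dtype=float)
    area_s = np.asarray(area_s, dtype=float)
    prior = p_s / area_s
    weight = (area_s / p_s) ** r
    return prior * weight

# Hypothetical values on three hyper-surfaces.
p_s = np.array([0.1, 0.4, 0.2])       # reference posterior density p(s|N)
area_s = np.array([1.0, 2.0, 4.0])    # surface measure A(s)

unweighted = weighted_prior(p_s, area_s, r=0.0)   # recovers p(s|N)/A(s)
halfway = weighted_prior(p_s, area_s, r=0.5)      # intermediate case
flat = weighted_prior(p_s, area_s, r=1.0)         # constant in theta
```

Scanning $r$ between the two limits and repeating the inference gives a direct measure of how strongly the conclusions depend on the prior.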

Our proposed construction must surmount a technical hurdle: generating a sample of 
points in the parameter space of the physics model with the properties that 1) the number 
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of points on each hyper-surface is proportional to the reference posterior density associated 
with that hyper-surface and 2) the points on the hyper-surface are uniformly distributed. We 
showed, using three illustrative examples, how one might address this question, in general. 
For high-dimensional models, the use of MCMC seems feasible. However, we have found 
that convergence may be an issue because of the severe degeneracies present when relatively 
little information is used to create the multi-parameter prior. In a realistic application it 
will be necessary to tune the MCMC algorithm to ensure convergence of the Markov chains. 
It would be useful to explore different sampling methods, such as MultiNest [39], that may 
be better suited to problems with severe degeneracies. 

In spite of these challenges, however, we have shown that our method yields priors that 
give consistent results as more and more data are accumulated. What remains to be done 
is to apply the method to a real analysis at the LHC. Our expectation is that the method 
would fare well. 

Acknowledgments 

We thank Jim Berger and Jose Bernardo for discussions on reference priors and Bayesian 
methods in general and Sabine Kraml for discussions on the SUSY models. We also thank 
Luc Demortier, Bob Cousins, and Kyle Cranmer for several discussions that helped clarify 
our thoughts. 

This work was supported in part by the U.S. Department of Energy under grant no. 
DE-FG02-97ER41022. 



[1] The Large Hadron Collider, http://lhc.web.cern.ch/lhc 

[2] Delphes, S. Ovyn, X. Rouby, and V. Lemaitre, [arXiv:0903.2225 [hep-ph]]. 

[3] J. Wess and B. Zumino, Nucl. Phys. B70, 39 (1974); H. P. Nilles, Phys. Rept. 110, 1 
(1984); H. Baer and X. Tata, Weak Scale Supersymmetry: From Superfields to Scattering 
Events (Cambridge University Press, Cambridge, 2006). 

[4] See for example, A. H. Chamseddine, R. L. Arnowitt, and P. Nath, Phys. Rev. Lett. 49, 970 
(1982); G. L. Kane, C. F. Kolda, L. Roszkowski, and J. D. Wells, Phys. Rev. D49, 6173 
(1994) [hep-ph/9312272]. 

[5] A. A. Markov, Izvestiya Fiziko-matematicheskogo obschestva pri Kazanskom universitete, 2-ya 
seriya, tom 15, 135 (1906); A. A. Markov, reprinted in Appendix B of R. Howard, Dynamic 
Probabilistic Systems, Vol. 1: Markov Chains (John Wiley and Sons, 1971). For a modern 
textbook introduction see, for example, B. A. Berg, Markov Chain Monte Carlo Simulations 
And Their Statistical Analysis (World Scientific, Singapore, 2004). 

[6] See for example, E. A. Baltz, P. Gondolo, JHEP 0410, 052 (2004) [arXiv:hep-ph/0407039 
[hep-ph]]; C. G. Lester, M. A. Parker, M. J. White, JHEP 0601, 080 (2006) 
[hep-ph/0508143]; R. R. de Austri, R. Trotta, L. Roszkowski, JHEP 0605, 002 (2006) 
[hep-ph/0602028]; E. A. Baltz, M. Battaglia, M. E. Peskin, T. Wizansky, Phys. Rev. D74, 103521 
(2006) [hep-ph/0602187]; B. C. Allanach, C. G. Lester, A. M. Weber, JHEP 0612, 065 
(2006) [hep-ph/0609295]; B. C. Allanach, C. G. Lester, Comput. Phys. Commun. 179, 256 
(2008) [arXiv:0705.0486 [hep-ph]]; L. M. H. Hall, H. V. Peiris, JCAP 0801, 027 (2008) 
[arXiv:0709.2912 [astro-ph]]; S. Davidson, J. Garayoa, F. Palorini, N. Rius, JHEP 0809, 
053 (2008) [arXiv:0806.2832 [hep-ph]]; H. Baer, S. Kraml, S. Sekmen, H. Summy, JHEP 
0803, 056 (2008) [arXiv:0801.1831 [hep-ph]]; O. Buchmueller, R. Cavanaugh, A. De Roeck, 
J. R. Ellis, H. Flacher, S. Heinemeyer, G. Isidori, K. A. Olive et al., JHEP 0809, 117 (2008) 
[arXiv:0808.4128 [hep-ph]]; F. Brummer, S. Fichet, S. Kraml, R. K. Singh, JHEP 1008, 096 
(2010) [arXiv:1007.0321 [hep-ph]]; H. Baer, S. Kraml, A. Lessa, S. Sekmen, X. Tata, JHEP 
1010, 018 (2010) [arXiv:1007.3897 [hep-ph]]. 

[7] C. F. Berger, J. S. Gainer, J. L. Hewett, and T. G. Rizzo, JHEP 0902, 023 (2009). 

[8] G. L. Bayatian et al. [CMS Collaboration], J. Phys. G 34, 995 (2007). 

[9] See for example, O. Buchmueller, R. Cavanaugh, A. De Roeck, J. R. Ellis, H. Flacher, 
S. Heinemeyer, G. Isidori, K. A. Olive et al., Eur. Phys. J. C64, 391 (2009) [arXiv:0907.5568 [hep-ph]]; 
O. Buchmueller, R. Cavanaugh, D. Colling, A. De Roeck, M. J. Dolan, J. R. Ellis, H. Flacher, 
S. Heinemeyer et al., Eur. Phys. J. C71, 1583 (2011) [arXiv:1011.6118 [hep-ph]]. 

[10] See for example, D. E. Lopez-Fogliani, L. Roszkowski, R. R. de Austri, T. A. Varley, Phys. 
Rev. D80, 095013 (2009) [arXiv:0906.4911 [hep-ph]]; R. Trotta, F. Feroz, M. P. Hobson, 
L. Roszkowski, R. Ruiz de Austri, JHEP 0812, 024 (2008) [arXiv:0809.3792 [hep-ph]]; 
B. C. Allanach, K. Cranmer, C. G. Lester, and A. M. Weber, JHEP 08, 023 (2007). 

[11] R. D. Cousins, J. T. Linnemann, and J. Tucker, Nucl. Instrum. Meth. A595, 480 (2008). 

[12] G. Cowan, K. Cranmer, E. Gross, and O. Vitells, Eur. Phys. J. C71, 1554 (2011) 
[arXiv:1007.1727 [physics.data-an]]. 

[13] F. Feroz, K. Cranmer, M. Hobson, R. Ruiz de Austri, and R. Trotta, JHEP 1106, 042 (2011) 
[arXiv:1101.3296 [hep-ph]]. 

[14] Y. Akrami, P. Scott, J. Edsjo, J. Conrad, and L. Bergstrom, JHEP 1004, 057 (2010) 
[arXiv:0910.3950 [hep-ph]]. 

[15] C. P. Robert, The Bayesian Choice: from Decision-Theoretic Foundations to Computational 
Implementation (Springer, New York, 2007), 2nd ed.; E. T. Jaynes, Probability Theory: The 
Logic of Science, edited by G. L. Bretthorst (Cambridge University Press, Cambridge, 2003); 
A. O'Hagan, Kendall's Advanced Theory of Statistics, Volume 2B: Bayesian Inference 
(Edward Arnold, London, 1994); H. Jeffreys, Theory of Probability (Oxford University Press, 
Oxford, 1961), 3rd ed. 

[16] V. M. Abazov et al. (D0 Collaboration), Phys. Rev. Lett. 103, 092001 (2009). 

[17] T. Aaltonen et al. (CDF Collaboration), Phys. Rev. Lett. 103, 092002 (2009). 

[18] I. Bertram, G. Landsberg, J. Linnemann, R. Partridge, M. Paterno, and H. B. Prosper, 
Fermilab preprint FERMILAB-TM-2104 (2000). 

[19] J. M. Bernardo, J. R. Statist. Soc. B 41, 113 (1979); J. O. Berger and J. M. Bernardo, J. 
Amer. Statist. Assoc. 84, 200 (1989); J. O. Berger and J. M. Bernardo, Biometrika 79, 25 
(1992); J. O. Berger and J. M. Bernardo, in Bayesian Statistics 4, edited by J. M. Bernardo, 
J. O. Berger, A. P. Dawid, and A. F. M. Smith (Oxford University Press, Oxford, 1992), pp. 35-60, 
http://www.uv.es/~bernardo/1992Valencia4Ref.pdf; J. M. Bernardo, in Handbook of 
Statistics 25, edited by D. K. Dey and C. R. Rao (Elsevier, Amsterdam, 2005), pp. 17-90, 
http://www.uv.es/~bernardo/RefAna.pdf. 

[20] L. Demortier, in Statistical Problems in Particle Physics, Astrophysics, and Cosmology: 
Proceedings of PHYSTAT05, Eds. L. Lyons and M. K. Unel (Imperial College Press, London, 
2006), pp. 11-14. 

[21] L. Demortier, S. Jain, and H. B. Prosper, Phys. Rev. D 82, 034002 (2010). 

[22] D. T. Gillespie, Am. J. Phys. 51, 520 (1983). 

[23] J. O. Berger, J. M. Bernardo, and D. Sun, Ann. Statist. 37, 905 (2009), 
http://www.uv.es/~bernardo/2009Annals.pdf 

[24] F. Feroz, K. Cranmer, M. Hobson, R. Ruiz de Austri, and R. Trotta, JHEP 1106, 042 (2011) 
[arXiv:1101.3296 [hep-ph]]. 

[25] I. J. Myung, V. Balasubramanian, and M. A. Pitt, Proc. Natl. Acad. Sci. USA 97, 11170 
(2000); http://www.ncbi.nlm.nih.gov/pmc/articles/PMC17172 

[26] J. O. Berger and R. L. Wolpert, The Likelihood Principle, Lecture Notes-Monograph Series, 
Vol. 6, Ed. S. S. Gupta (Institute of Mathematical Statistics, Hayward, 1984). 

[27] See, for example, D. J. C. MacKay, Bayesian Methods for Adaptive Models, PhD Thesis, 
Caltech (1992), http://www.inference.phy.cam.ac.uk/mackay/PhD.html 

[28] M. Pierini, H. Prosper, S. Sekmen, and M. Spiropulu, [arXiv:1107.2877 [hep-ph]]. 

[29] SOFTSUSY, B. C. Allanach, Comput. Phys. Commun. 143, 305 (2002) [hep-ph/0104145]. 

[30] SUSYHIT, A. Djouadi, M. M. Muhlleitner, and M. Spira, Acta Phys. Polon. B38, 635 (2007) 
[hep-ph/0609292]. 

[31] PYTHIA, T. Sjostrand, S. Mrenna, and P. Z. Skands, JHEP 0605, 026 (2006) [hep-ph/0603175]. 

[32] R. Adolphi et al. [CMS Collaboration], JINST 3, S08004 (2008). 

[33] PGS, J. Conway et al., 
http://physics.ucdavis.edu/~conway/research/software/pgs/pgs4-general.htm 

[34] S. Sekmen, Ph.D. Thesis, CMS TS-2009/025. 

[35] SuperIso, F. Mahmoudi, Comput. Phys. Commun. 178, 745 (2008) [arXiv:0710.2067 [hep-ph]]; 
F. Mahmoudi, Comput. Phys. Commun. 180, 1579 (2009) [arXiv:0808.3144 [hep-ph]]. 

[36] micrOMEGAs, G. Belanger, F. Boudjema, A. Pukhov, A. Semenov, Comput. Phys. Commun. 
176, 367 (2007) [hep-ph/0607059]. 

[37] K. Nakamura et al. [Particle Data Group Collaboration], J. Phys. G 37, 075021 (2010). 

[38] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. 
Phys. 21, 1087 (1953); W. K. Hastings, Biometrika 57, 97 (1970). 

[39] MultiNest, F. Feroz, M. P. Hobson, and M. Bridges, [arXiv:0809.3437 [astro-ph]]. 

[40] In this limit (essentially, when the two hypotheses $H_1$ and $H_0$ are nearly degenerate) the 
KL divergence can be interpreted as twice the square of the distance between the associated 
densities in the space of functions [25]. 



Appendix A: Derivation of Background Prior 

This form for the prior $\pi(\mu)$ can be motivated [21] by considering an experiment comprising 
two data-sets S and B. Data-set S is modeled as a mixture of signal and background 
events with expected background count $\mu$. Data-set B, perhaps a sideband, is presumed 
to be overwhelmingly dominated by background events with expected background $b\mu$. Although 
we do not know $\mu$, we assume that we know the ratio $b$ of the expected background 
in data-set B to that in data-set S. The expected background $b\mu$ for data-set B is estimated 
by the number of events $Y$ in that data-set. The likelihood for the observed count $Y$ in 
data-set B is taken to be Poisson($Y|b\mu$), which, together with its reference prior, $\propto 1/\sqrt{\mu}$, 
yields the posterior density $p(\mu|Y) \propto \exp(-b\mu)(b\mu)^{Y-1/2}$. This posterior density serves as 
the evidence-based prior $\pi(\mu)$ for the expected background in data-set S. 
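
The posterior density above is a gamma density in $\mu$ with shape $Y + 1/2$ and rate $b$, so its mean and standard deviation follow in closed form. The sketch below evaluates them for the values $Y = 100$ and $b = 0.889$ of Eq. (29) and checks them against random draws. The use of NumPy's gamma sampler and the variable names are our choices for illustration.

```python
import numpy as np

Y, b = 100, 0.889            # sideband yield and scale factor, from Eq. (29)

# p(mu|Y) ∝ exp(-b*mu) * (b*mu)**(Y - 1/2) is Gamma(shape = Y + 1/2, rate = b).
shape, rate = Y + 0.5, b
mean = shape / rate                  # analytic mean of the background prior
std = np.sqrt(shape) / rate          # analytic standard deviation

# Monte Carlo check (NumPy parametrizes the gamma density by scale = 1/rate).
rng = np.random.default_rng(0)
samples = rng.gamma(shape, 1.0 / rate, size=200_000)

print(f"analytic: {mean:.1f} +/- {std:.1f} events")
print(f"sampled : {samples.mean():.1f} +/- {samples.std():.1f} events")
```

With these inputs the prior has a mean of about 113 events and a standard deviation of about 11.3 events, in agreement with the background estimate quoted in Eq. (29).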



Appendix B: Definition of Reference Prior for the Single Count Model 

One begins with the information gained from $K$ repetitions of the single count experiment, 

$I_K[\pi] = \sum_{N_1=0}^{\infty} \cdots \sum_{N_K=0}^{\infty} m(N_{(K)}) \, D[\pi, p(s|N_{(K)})]$, (B1) 

where 

$m(N_{(K)}) = \int p(N_{(K)}|s) \, \pi(s) \, ds$, with $p(N_{(K)}|s) = \prod_{i=1}^{K} p(N_i|s)$, (B2) 

is the marginal density for $K$ experiments. The maximization of the expected information 
gain, $I_K[\pi]$, with respect to the prior yields the function $\pi_K(s)$. By definition [23], the 
reference prior $\pi(s)$ is the limit 

$\pi(s) = \lim_{K \to \infty} \frac{\pi_K(s)}{\pi_K(s_0)}$, 

with $\pi_K(s) = \exp\left\{ \sum_{N_1=0}^{\infty} \cdots \sum_{N_K=0}^{\infty} p(N_{(K)}|s) \ln\left[ \frac{p(N_{(K)}|s) \, h(s)}{\int p(N_{(K)}|s) \, h(s) \, ds} \right] \right\}$, (B3) 

where $s_0$ is any fixed point in the space of expected signal and $h(s)$ is any positive function, 
such as $h(s) = 1$. However, since the posterior density for the single count model is 
asymptotically normal, the reference prior computed using the above algorithm coincides 
with Jeffreys prior, Eq. 
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Appendix C: Calculation of Marginal Likelihood 



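Defining the recursive functions, 

$W_0(s,z) = 1$, 
$W_k(s,z) = z \, (s/k) \, W_{k-1}(s,z)$, for $k \geq 1$, 
$\mathcal{Y}_0(z) = 1$, 
$\mathcal{Y}_k(z) = z \, \frac{Y - 1/2 + k}{k \, (b+1)} \, \mathcal{Y}_{k-1}(z)$, for $k \geq 1$, (C1) 

we can write $p(n|s)$ and $T_n^m(s)$ as 

$p(n|s) = \left( \frac{b}{b+1} \right)^{Y+1/2} \sum_{k=0}^{n} W_k(s,z) \, \mathcal{Y}_{n-k}(z)$, (C2) 

$T_n^m(s) = \sum_{k=0}^{n} k^m \, W_k(s,z) \, \mathcal{Y}_{n-k}(z)$, (C3) 

with $z = 1$ for $n = 0$ and $z = e^{-s/n}$ for $n > 0$. 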

Appendix D: Mapping Procedure for a 2D Toy Model 

To illustrate further how the mapping from a 1-D posterior density to an n-D parameter 
space works in practice, we consider the case of a model described by two unknown parameters 
$x$ and $y$. An experimental measurement is available for the quantity $\rho = \sqrt{x^2 + y^2}$. One 
builds the reference prior corresponding to all the possible outcomes of the measurement 
of $\rho$ and derives a reference posterior $p(\rho)$. We now want to find a function $\pi(x,y)$ that is 
consistent with the 1-D reference posterior density $p(\rho)$. 

To solve this problem, we impose two conditions: 

• $\pi(x,y)$ is constant for all the points $(x,y)$ corresponding to the same value of $\rho$. 
This implies that $\pi(x,y) = \pi(\rho(x,y))$. This makes perfect sense because the only 
available information on $x$ and $y$ is the measurement of $\rho$, which cannot break the 
degeneracy of the iso-$\rho$ contour. Without any loss of generality, we can then write 
$\pi(x,y) = p(\rho(x,y))/A(\rho(x,y))$; 

• when marginalized to $\rho$, through Eq. (20), $\pi(x,y)$ should recover $p(\rho)$. This consistency 
requirement, together with the first, is what permits identifying $A(\rho(x,y))$ with the 
"area" of the iso-$\rho$ contour. 

FIG. 8: (left) Induced posterior density $p'(x,y) = p(\rho(x,y))$, where $p(\rho)$ is the 1-D reference 
posterior density. (right) Ratio of $p'(x,y)$ marginalized back to $\rho$, via Eq. (20), over the reference 
posterior density $p(\rho)$. Clearly the two 1-D densities are not the same, as they should be if the 
density $p'(x,y)$ were consistent with $p(\rho)$. 

The first requirement is quite natural if one thinks of the Bayesian analysis as an update 
of our knowledge about the parameters x and y. The second requirement may need further 
explanation. 

Suppose for the moment that the function $A(\rho(x,y))$ does not enter the problem. Enforcing 
the first condition would then imply that $\pi(x,y) = p(\rho(x,y))$. Consider a measurement 
of $\rho$ with a Gaussian likelihood. This measurement would translate into a 2-D function 
of $x$ and $y$ as shown in the left plot of Fig. 8. Once marginalized, this function gives a 
function $g(\rho)$ which differs from $p(\rho)$ by a factor linear in $\rho$, coming from the Jacobian of 
the $(x,y) \to \rho$ marginalization. This is shown in the right plot of Fig. 8, which shows the 
ratio $g(\rho)/p(\rho)$ as a function of $\rho$. 

However, in this specific case, we know the form of the function $A(\rho)$; it is simply given by 
$A(\rho) = 2\pi\rho$. Therefore, the correct mapping from 1-D to 2-D yields $\pi(x,y) = p(\rho(x,y))/2\pi\rho$, 
shown in the left plot of Fig. 9, which gives a constant value for the ratio $g(\rho)/p(\rho)$ (see 
right plot of Fig. 9) as one would expect for a density $\pi(x,y)$ that is consistent with $p(\rho)$. 

FIG. 9: (left) Induced posterior density $p'(x,y) = p(\rho(x,y))/2\pi\rho$, where $p(\rho)$ is the 1-D reference 
posterior density. (right) Ratio of $p'(x,y)$ marginalized back to $\rho$, via Eq. (20), over the reference 
posterior density $p(\rho)$. The two 1-D densities are identical, as they should be since, by construction, 
the density $p'(x,y)$ is consistent with $p(\rho)$. 

In the absence of an analytical solution for $A(x,y)$, one could follow a simple numerical 
procedure, which takes full advantage of the fact that $A(x,y) = A(\rho(x,y))$. This simple 
fact implies that, by incorrectly using $p'(x,y) = p(\rho(x,y))$, one is wrong by a factor that 
is constant over the iso-$\rho$ contour. This factor is nothing else than the ratio $g(\rho)/p(\rho)$, 
mapped onto the $(x,y)$ plane (see left plot of Fig. 10). This simple construction allows one 
to solve for the integral, Eq. (23), defining $A(x,y)$ without having to perform the integral 
explicitly; one simply weights each point by $g(\rho)/p(\rho)$, which is shown in the right-hand 
plot of Fig. 10. When the corrected function $\pi(x,y)$ is marginalized, the function $p(\rho)$ is 
recovered by construction. 
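
The toy model can be checked numerically with a short grid-based sketch. It is illustrative only: the Gaussian form of $p(\rho)$ with mean 3 and width 0.5, the grid spacing and the binning are our assumptions, and the marginalization to $\rho$ is approximated by a weighted histogram rather than by Eq. (20) itself.

```python
import numpy as np

def p_rho(rho, mean=3.0, sigma=0.5):
    """Assumed 1-D density p(rho): a Gaussian stand-in for the reference posterior."""
    return np.exp(-0.5 * ((rho - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Regular grid in the (x, y) plane.
axis = np.linspace(-6.0, 6.0, 1201)
dx = axis[1] - axis[0]
x, y = np.meshgrid(axis, axis)
rho = np.hypot(x, y)

# Bins in rho used to marginalize the 2-D densities back to 1-D.
edges = np.linspace(1.5, 4.5, 31)
centers = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]

def marginal(density):
    """Integrate a 2-D density over each rho bin and divide by the bin width."""
    hist, _ = np.histogram(rho.ravel(), bins=edges, weights=(density * dx * dx).ravel())
    return hist / width

naive = p_rho(rho)                                  # p'(x, y) = p(rho(x, y))
safe_rho = np.where(rho > 0.0, rho, np.inf)         # avoid dividing by zero at the origin
corrected = p_rho(rho) / (2.0 * np.pi * safe_rho)   # p(rho(x, y)) / A(rho), A = 2*pi*rho

ratio_naive = marginal(naive) / p_rho(centers)          # grows linearly, ~ 2*pi*rho
ratio_corrected = marginal(corrected) / p_rho(centers)  # ~ 1 for every bin
```

The naive mapping gives a ratio $g(\rho)/p(\rho)$ that grows linearly with $\rho$, while dividing by $A(\rho) = 2\pi\rho$ makes the ratio flat. When $A$ is not known analytically, `ratio_naive` itself plays the role of the correction map.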

The use of MCMC to sample the space $(x,y)$ makes the procedure even simpler. Rather 
than scanning the $(x,y)$ plane and associating to each point the value of $p(\rho)$, one samples 
$(x,y)$ according to $p(\rho)$ directly. This implies that $g(\rho) = p(\rho)$ by construction, as one can 
easily verify. 

FIG. 10: (left) Correction map in the $(x,y)$ plane and (right) the same map in $\rho$ space. 
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